Introduction

Mental illness is one of the leading causes of disability in the world. In a report evaluating US health from 1990 to 2010, mental illness was among the diseases with the highest burden [1]. Disease burden refers to the impact of a health problem as measured by financial cost, mortality, morbidity, and other factors. Most patients with serious mental diseases or disorders spend years struggling, unable to live a normal life. This is a heavy burden for patients, their families, and society.

New York City faces the same problem, with a high rate of mental illness. Several studies have suggested a possible relationship between urban life and a higher risk of mental illness. In 2015, New York City launched an action plan called “thriveNYC” [2], which aims to change the way people think about mental health and to provide more accessible services citywide.

The goals of my project here are:

  1. To gain insight into the mental health situation in NYC;

  2. To discover potentially effective interventions and provide guidance for the distribution of NYC funding and services, ultimately to improve New Yorkers’ mental health.

The data I am using are from Health Data NY, Statewide Planning and Research Cooperative System (SPARCS) [3]. The raw data include all New York State hospital inpatient discharges, for all diseases, in 2014.

References:
[1] US Burden of Disease Collaborators. The state of US health, 1990-2010: burden of diseases, injuries, and risk factors. JAMA, 310(6): 591-608, 2013.
[2] ThriveNYC. https://thrivenyc.cityofnewyork.us/
[3] Health Data NY. Hospital Inpatient Discharges (SPARCS De-Identified). https://health.data.ny.gov/Health/Hospital-Inpatient-Discharges-SPARCS-De-Identified/mpue-vn67


Dataset Exploration

  • Load the dataset of patient information for all-disease hospitalizations in NY, 2014
df14 <- read.csv("Hospital_Inpatient_Discharges__SPARCS_De-Identified___2014.csv")
  • Subset data to find those with Mental Diseases & Disorders in NYC
    1. Subset data to hospitalizations in NYC (hospitals in the five NYC counties)
    2. Subset data to patients with mental diseases and disorders

    Mental diseases and disorders hospitalizations are extracted from the dataset by APR-DRG code, based on the diagnosis. Neurological diseases and drug and alcohol abuse are excluded.

    Details of the DRG codes used are listed here: https://github.com/super-penguin/SPARCS-health-data.

# Subset the Data
# Subset the inpatient hospitalizations for Mental Diseases & Disorders in NYC, 2014
# The subset is based on DRG code
Mental.Code<- c(740, 750, 751, 752, 753, 754, 755, 756, 758, 759, 760, 561, 766)
County <- c("Bronx","Kings", "Manhattan", "Queens", "Richmond")
df14.NYC <- subset(df14, Hospital.County %in% County)
df14.mental<- subset(df14.NYC, APR.DRG.Code %in% Mental.Code)

1. Compare the top 10 diseases in NYC 2014

1.1. Plot top 10 diseases with highest hospitalization in NYC

  • Remove hospitalizations for Newborn, Vaginal delivery and Cesarean delivery.
    • These three are not caused by disease.
# Convert the charge and cost columns ($) to numeric for further exploration;
# destring() (taRifx package) strips "$", commas and other non-numeric characters
library(taRifx)
df14.NYC$Total.Charges<- destring(df14.NYC$Total.Charges)
df14.NYC$Total.Costs<- destring(df14.NYC$Total.Costs)
df14.mental$Total.Charges<- destring(df14.mental$Total.Charges)
df14.mental$Total.Costs<- destring(df14.mental$Total.Costs)

# Group dataset by DRG code and count the patients for each disease
library(dplyr)  # provides %>%, group_by(), summarise(), arrange()
df14.NYC.DRG.Group<- df14.NYC %>%
    group_by(APR.DRG.Code, APR.DRG.Description) %>%
    summarise(Total.Patients.Number = n()) %>%
    arrange(Total.Patients.Number)
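The grouping code above does not itself drop the newborn and delivery records mentioned in 1.1. A minimal base-R sketch of that step, assuming the grouped table has been renamed to hypothetical columns `description` and `patients`, and assuming the excluded DRGs can be matched by the description substrings shown (the exact APR-DRG description text should be checked against the data):

```r
# Hypothetical helper: drop non-disease DRGs (matched by description substring,
# case-insensitive) and return the n most common remaining diseases, largest first.
top_diseases <- function(counts,
                         exclude = c("newborn", "vaginal delivery", "cesarean delivery"),
                         n = 10) {
  pattern <- paste(exclude, collapse = "|")
  keep <- !grepl(pattern, counts$description, ignore.case = TRUE)
  kept <- counts[keep, , drop = FALSE]
  kept <- kept[order(kept$patients, decreasing = TRUE), , drop = FALSE]
  rownames(kept) <- NULL
  head(kept, n)
}

# Toy counts for illustration only (not SPARCS figures)
toy_counts <- data.frame(
  description = c("Septicemia", "Vaginal delivery", "Schizophrenia",
                  "Normal newborn", "Heart failure"),
  patients = c(30, 90, 25, 80, 40),
  stringsAsFactors = FALSE
)
top_diseases(toy_counts, n = 2)
```

Matching on the DRG codes themselves would be more robust than matching on text; substrings are used here only because the codes for those three groups are not listed in this report.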
  • Bar Plot of the Top 10 Diseases in NYC 2014

In this figure, Schizophrenia and Bipolar Disorders both belong to the mental disorders. Two of the top 10 diseases are mental disorders, and Schizophrenia is the third most common. This underlines the importance of understanding the mental health situation in NYC.

1.2. Plot the Number of Patients with Different Mental Diseases & Disorders by DRG Code

Schizophrenia, Bipolar Disorders and Major Depressive Disorders were the three most common mental illnesses in NYC in 2014.

1.3. Plot the Fraction of Patients Admitted Through Emergency Department for the Top 10 Diseases in NYC 2014

Comparing the emergency admission rates of the top 10 diseases in NYC, Schizophrenia and Bipolar Disorders are not the highest, but both lie in the higher range (around 70%).

The high emergency admission rate for mental diseases and disorders implies that early action is important for improving mental health. Improving early counseling services and early response teams might be an effective way to give patients the help they need and to keep their conditions from getting worse.
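The emergency admission fraction in 1.3 is the mean of an emergency flag within each disease. A minimal sketch with a hypothetical helper, assuming the flag is coded "Y"/"N" (an assumption about the SPARCS emergency department indicator; check the actual values):

```r
# Hypothetical helper: share of admissions flagged as emergency ("Y"), per disease,
# sorted from highest to lowest share.
emergency_share <- function(ed_flag, disease) {
  share <- tapply(ed_flag == "Y", disease, mean)  # mean of TRUE/FALSE = fraction TRUE
  sort(share, decreasing = TRUE)
}

# Toy data for illustration only
ed  <- c("Y", "Y", "N", "Y", "N", "N")
drg <- c("Schizophrenia", "Schizophrenia", "Schizophrenia",
         "Heart failure", "Heart failure", "Heart failure")
emergency_share(ed, drg)
```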

1.4. Compare and Plot the Total Charges of the Top 10 Diseases in NYC 2014

The average total charge for Schizophrenia is the third highest among the top 10 diseases. This is a huge financial burden both to patients' families and to our city.

1.5. Compare and Plot the Length of Hospitalization of the Top 10 Diseases in NYC 2014

The hospitalization lengths of Schizophrenia and Bipolar Disorders are both in the higher range. Even so, this figure may understate the long-term burden of mental diseases: most patients still need extra care at home or in specialized facilities after discharge from the hospital.
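Before length of stay can be averaged, the column has to be numeric. A minimal sketch with a hypothetical helper, assuming the column may have been read as a factor and may contain top-coded entries written like "120 +" (check the raw values). Converting through character avoids the factor-code pitfall; stripping non-digits treats a top-coded stay as exactly its cap, which understates the longest stays.

```r
# Hypothetical helper: convert a length-of-stay column to numeric days.
# as.character() comes first because as.numeric() on a factor returns level codes.
# Non-digit characters (e.g. the " +" of a top-coded "120 +") are removed,
# so a top-coded stay is counted as exactly 120 days.
parse_los <- function(x) {
  as.numeric(gsub("[^0-9.]", "", as.character(x)))
}

los <- factor(c("3", "12", "120 +", "7"))
parse_los(los)   # numeric days
as.numeric(los)  # factor level codes, NOT days
```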

1.6. Observations and Reflections

In this chapter, the severity of mental disorders is compared with the top 10 diseases in NYC on several measures. In 2014, schizophrenia alone was already the third leading cause of hospitalization in NYC. Besides the shocking number of patients with mental problems, the high charges and long hospital stays are a heavy burden both to patients and to our city. Most patients with severe mental disorders lose the ability to work and live independently for years or even a lifetime, and they need constant extra care at extra cost.

These figures make it clear that mental health is one of our city's most urgent problems, and that effective data sharing should be coordinated to develop new strategies.

2. Discover the vulnerable groups in NYC who are more likely to suffer from mental problems

2.1. Bar Plot of Patients with different age and racial groups

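  • Group patient data by age, race and gender
df14.fc_by_age_race <- df14.mental %>%
    filter(Race != "Multi-racial") %>%
    group_by(Age.Group, Race, Gender) %>%
    # Length.of.Stay may be read as a factor (it can hold non-numeric entries),
    # and as.numeric() on a factor returns level codes, so use destring() instead
    summarise(mean_days = mean(destring(Length.of.Stay)),
              mean_costs = mean(Total.Costs),  # already converted by destring() above
              n = n()) %>%
    arrange(Age.Group)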
  • Data Visualization - Bar Plot

There appears to be a marked racial difference in the number of patients with mental diseases. To quantify it better, the next figure normalizes patient counts by the corresponding racial population.

2.2. Normalizing Patients Number with the Corresponding Racial Population

  • The Estimated Population of NYC in 2014:
    • Total Population: 8405837
    • White: 33% - 2773926
    • Black/African American: 23% - 1933342
    • Other Race: 44% - 3698568
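The normalization divides each group's patient count by that group's population. A minimal sketch using the population estimates listed above; the patient counts are placeholders (not results), and `rate_per` is a hypothetical helper. Dividing one group's rate by another's gives the relative comparison between races.

```r
# Hypothetical helper: hospitalizations per `per` residents of each group.
rate_per <- function(patients, population, per = 10000) {
  patients / population * per
}

# 2014 population estimates from the list above
population <- c(White = 2773926, Black = 1933342, Other = 3698568)
# Placeholder counts: replace with the grouped counts from df14.mental
patients <- c(White = 1000, Black = 1000, Other = 1000)

rates <- rate_per(patients, population)
rates / rates["White"]  # each group's rate relative to White residents
```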

Conclusion

There is a marked racial difference in mental diseases and disorders hospitalizations in NYC. The hospitalization rate for mental problems among Black/African Americans is almost twice that of other races.

2.3. Bar Plot of Patients with different age and gender groups

3. Explore the temporal patterns of mental health hospitalization in NYC

3.1. Compare the day of hospitalization admission for mental diseases and disorders in NYC

The plot of hospitalization admission days shows an interesting pattern. Many more patients are admitted on weekdays than at weekends. Admissions rise from Monday to Wednesday and fall from Wednesday to Friday, then drop sharply on Saturday and fall further on Sunday. This trend mirrors the weekly rhythm of work pressure, which suggests that work and study pressure may be an important trigger of mental illness in NYC.
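The weekday pattern comes from tallying admissions by day of the week. A minimal sketch with a hypothetical helper, assuming admission dates are available as "YYYY-MM-DD" strings (the column and its format are assumptions; the de-identified extract may store this differently). `format(..., "%u")` gives 1 for Monday through 7 for Sunday, which keeps the tally independent of the system locale, unlike `weekdays()`.

```r
# Hypothetical helper: count admissions per weekday, Monday to Sunday.
admissions_by_weekday <- function(admit_dates) {
  day_num <- as.integer(format(as.Date(admit_dates), "%u"))  # 1 = Monday ... 7 = Sunday
  counts <- tabulate(day_num, nbins = 7)                     # zero-filled for missing days
  names(counts) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
  counts
}

# Toy dates for illustration only
admissions_by_weekday(c("2014-01-06", "2014-01-08", "2014-01-08", "2014-01-12"))
```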

4. Explore the relationship between total costs for mental diseases and the length of hospitalization in NYC 2014

4.1. Scatterplot of the relationship between total costs and length of hospitalization in NYC

In this figure, the black line is the average cost, the red line is the 90% quantile and the blue line is the 10% quantile. Hospitalization costs increase with length of stay in an unusual pattern. For several mental diseases, costs spike at day 25. Beyond 25 days, costs increase gradually with length of stay; however, for certain lengths of stay the costs are consistently low. This pattern needs further investigation.
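The three lines in 4.1 are per-length-of-stay summaries of cost. A minimal base-R sketch with a hypothetical helper that computes the mean and the 10% and 90% quantiles (R's default type-7 quantiles) for each length of stay; it assumes both inputs are already numeric, for example `cost_by_stay(los_days, df14.mental$Total.Costs)` after converting length of stay.

```r
# Hypothetical helper: mean, 10% and 90% quantile of cost for each length of stay.
cost_by_stay <- function(los, cost) {
  groups <- split(cost, los)  # one vector of costs per length of stay, sorted by stay
  data.frame(
    los  = as.numeric(names(groups)),
    mean = sapply(groups, mean),
    q10  = sapply(groups, quantile, probs = 0.10, names = FALSE),
    q90  = sapply(groups, quantile, probs = 0.90, names = FALSE),
    row.names = NULL
  )
}

# Toy data for illustration only
cost_by_stay(los = c(1, 1, 2, 2, 2), cost = c(100, 200, 300, 400, 500))
```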

Since the most common mental illnesses are schizophrenia, bipolar disorders and major depressive disorders, I investigate only those three diseases in the next figure, hoping to find a pattern in total cost versus length of stay.

4.2. Scatterplot of the relationship between total costs and length of stay for different payment methods with the top 3 mental illnesses

The most common payment types for mental illness are Medicaid and Medicare, and the plots for those two payment methods show a similar pattern. Other payment types may also contain useful information; however, in the next figure I investigate only those two payment methods for the top 3 mental illnesses.

4.3. Further exploration of the relationship between total costs and length of stay

  • Visualize whether emergency admission affects the relationship between total cost and length of stay

Interestingly, emergency admission does not separate the plot at all. There is a big jump in cost at day 25 and surprisingly low costs on some discrete days between 25 and 125. Except for those discrete days with extraordinarily low costs, cost and length of stay have a fairly good linear relationship, and emergency admission does not affect this relationship.
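Whether emergency admission changes the cost and length-of-stay relationship can also be checked numerically, with a linear model that lets the slope differ by group. A minimal sketch, assuming columns named `cost`, `los` and `emergency` (hypothetical names, with the flag coded "N"/"Y"); in practice it would be fitted to the stays outside the low-cost days. A small interaction coefficient `los:emergencyY` relative to the `los` slope means the two groups share roughly the same line.

```r
# Hypothetical check: does emergency admission change the cost-vs-stay line?
# Model: cost = b0 + b1 * los + b2 * emergencyY + b3 * (los * emergencyY)
# b3 (named "los:emergencyY") is the difference in slope between the groups.
emergency_interaction <- function(df) {
  df$emergency <- factor(df$emergency, levels = c("N", "Y"))
  fit <- lm(cost ~ los * emergency, data = df)
  coef(fit)
}

# Toy data for illustration only: both groups follow cost = 500 + 1000 * los
toy_stays <- data.frame(los = 1:10, emergency = rep(c("N", "Y"), 5),
                        stringsAsFactors = FALSE)
toy_stays$cost <- 500 + 1000 * toy_stays$los
emergency_interaction(toy_stays)
```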

4.4. Further exploration of the relation between total costs and length of stay

  • Visualize whether the severity of mental illness affects the relationship between total cost and length of stay

Severity of illness is coded from 1 to 4, where 1 is the least severe and 4 the most severe. The percentage of emergency admissions is higher for more severe mental illnesses (severity 3 and 4). The big jump in cost exists for severity 1 to 3, especially for severity 2, which has the most patients.
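The severity comparison amounts to a cross-tabulation of severity against the emergency flag. A minimal sketch with a hypothetical helper, assuming severity is coded 1 to 4 as described above and the emergency flag is coded "N"/"Y" (an assumption); a severity level with no patients would show an emergency share of NaN.

```r
# Hypothetical helper: patient count and emergency-admission share per severity level.
severity_table <- function(severity, ed_flag) {
  tab <- table(factor(severity, levels = 1:4),
               factor(ed_flag, levels = c("N", "Y")))
  data.frame(
    severity        = 1:4,
    patients        = as.vector(rowSums(tab)),
    emergency_share = as.vector(tab[, "Y"] / rowSums(tab))
  )
}

# Toy data for illustration only
severity_table(c(1, 2, 2, 3, 3, 4), c("N", "N", "Y", "Y", "Y", "Y"))
```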

Surprisingly, severity of illness does not separate this pattern either. In conclusion, none of the factors examined (different mental illnesses, payment methods, emergency admission and severity of illness) changes the pattern of total cost versus length of hospitalization.

Based on these results, I suspect that the pattern might be caused by differences between hospital types. However, I do not have enough information to test this; further study and analysis are needed to answer this question.

Final Plots

Final plot 1 - The Racial Difference in Mental Illness of NYC 2014

Final plot 2 - The Gender and Age Difference in Mental Illness of NYC 2014

Final plot 3 - The Day of Hospitalization Admission for Mental Illness in NYC 2014

Observations and Reflections for Final Plot 1-3

There are marked racial and gender differences in mental diseases and disorders hospitalizations in NYC.

In 2014, Black/African Americans had more mental disorder hospitalizations than other racial groups in almost every age group. I looked into potential reasons. One possible explanation is genetic difference; however, I did not find much evidence to support this assumption. Another possible reason is bias in mental disorder diagnosis, meaning that one race is more likely to be diagnosed with severe mental disorders. Some studies show that a Black/African American patient is more likely to be diagnosed with schizophrenia for the same symptoms that lead to a depression diagnosis in a White American. However, this does not explain my results, since I grouped all of those mental diseases together. I will explore the data further to see whether I can find a reasonable explanation for the racial difference.

The second figure indicates that people aged 30-49 are more likely to be hospitalized for mental problems. In thriveNYC, much of the money and effort is aimed at helping young people or preventing mental problems among them. My results, however, give a different perspective: middle-aged people are the most vulnerable group, possibly because of higher pressure from both family and society. I therefore believe more attention should be paid to improving the mental health of middle-aged people, and mental health counseling and services should be provided more widely in workplaces and communities.

The gender difference is another interesting observation. I debated whether to include Maternal Depression in the total mental health data, since it might bias the gender comparison. However, even with Maternal Depression included, adult males still have a much higher hospitalization rate for mental problems. This also suggests that work and family pressure might be one of the leading causes of mental problems in NYC.

Finally, as in 3.1, many more patients are admitted on weekdays than at weekends: admissions rise from Monday to Wednesday, fall toward Friday, drop sharply on Saturday and fall further on Sunday. This trend mirrors the weekly rhythm of work pressure and suggests that work and study pressure may be an important trigger of mental illness in NYC.

Final plot 4 - Map of NYC Mental Diseases and Disorders Hospitalizations at the County Level

## OGR data source with driver: ESRI Shapefile 
## Source: "nybb_16c", layer: "nybb"
## with 5 features
## It has 4 fields

Map Patients Number (per 10,000 Population) with Mental Problems

The “leaflet” package is used to build an interactive map with three layers. The first layer, with popup “Total Patients Number”, displays the number of patients in each county; the second layer, with popup “Populations”, displays the total population of each county; and the last layer, with popup “Patients per 10,000 populations”, shows the number of patients per 10,000 population in each county.

Observations and Reflections for Final Plot 4 - Map 1

Among the five counties of New York City, Manhattan does not have the largest population, yet it has the largest number of patients with mental problems. Normalizing by county population widens the gap further: Manhattan has many more patients with mental diseases and disorders per 10,000 population than the other counties.

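This result is not surprising: Manhattan's dense population and high living pressure might be the leading cause of the difference. One caveat is that the data are grouped by hospital county (Hospital.County), so Manhattan's count may include patients who live in other boroughs but are treated in Manhattan hospitals. With that caveat, this observation suggests that more funding and services should be directed to Manhattan to improve the mental health of New Yorkers.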

Map Average Costs of Mental Diseases and Disorders Hospitalization

Observations and Reflections for Final Plot 4 - Map 2

On this map, the Bronx has the highest average cost for mental diseases and disorders hospitalizations. This is quite surprising: Manhattan has the highest rate of mental health patients, yet its hospitalization costs for mental diseases and disorders are lower than the Bronx's. The legend shows that the average cost of hospitalization for mental diseases in the Bronx is twice that on Staten Island.

Further information is needed to investigate the reason for this, and I believe the city government and other agencies need to look into this problem. If necessary, action should be taken to lower the cost of hospitalization in the Bronx.

Final Reflections

This dataset is limited in many ways. First, the hospitalization information in this dataset does not account for readmissions. Patients with mental diseases and disorders have a high readmission rate, but readmission information is missing from this dataset. This missing factor may introduce bias, and interesting observations may be overlooked without considering readmission.

In addition, for confidentiality reasons, patient ZIP code information is incomplete: only the first 3 digits are given, which makes it impossible to map the mental health profile at the community level. New York City is a large and ethnically diverse metropolis, and county-level analysis does not capture the health situation of its communities, each with its own demographic characteristics. I will continue this project with more detailed data and hope to map a better NYC mental health profile.